The concept of the variable allows us to quantify various aspects of our observations. Variables differ in their scale of measurement:
nominal/categorical: These variables have a limited number of levels which cannot be ordered in a meaningful way. For instance, it does not matter which value of SUBORDTYPE or MORETHAN2CL comes first or last:
unique(cl.order$SUBORDTYPE)
[1] "temp" "caus"
unique(cl.order$MORETHAN2CL)
[1] "no" "yes"
ordinal: Such variables can be ordered, but the intervals between their individual values are not meaningful. Heumann et al. (2022: 6) provide a pertinent example:
“[T]he satisfaction with a product (unsatisfied–satisfied–very satisfied) is an ordinal variable because the values this variable can take can be ordered but the differences between ‘unsatisfied–satisfied’ and ‘satisfied–very satisfied’ cannot be compared in a numerical way”.
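The difference between nominal and ordinal variables can be made explicit in R through factors. The following is a minimal sketch that assumes a small, invented vector of satisfaction ratings (it is not part of cl.order); the object names are hypothetical.

```r
# Hypothetical satisfaction ratings (invented for illustration)
ratings <- c("satisfied", "unsatisfied", "very satisfied", "satisfied")

# Nominal treatment: the levels carry no order (R merely sorts them alphabetically)
ratings_nominal <- factor(ratings)

# Ordinal treatment: declare the order of the levels explicitly
ratings_ordinal <- factor(
  ratings,
  levels = c("unsatisfied", "satisfied", "very satisfied"),
  ordered = TRUE
)

# Order comparisons are defined for ordered factors:
# "satisfied" ranks above "unsatisfied"
is_higher <- ratings_ordinal[1] > ratings_ordinal[2]

# Order-based summaries such as the highest category are available too
highest <- max(ratings_ordinal)
```

Note that the ordered factor still says nothing about how far apart the categories are; it only records their rank.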
In the case of interval-scaled variables, the differences between the values can be interpreted, but their ratios cannot. A temperature of 4°C is 6 degrees warmer than -2°C; however, the ratio \(\frac{4}{-2} = -2\) does not describe how much warmer 4°C is, just as 20°C is not twice as warm as 10°C. This is because the Celsius scale has no true zero point; 0°C simply signifies another point on the scale and not the absence of temperature altogether.
Ratio-scaled variables allow both a meaningful interpretation of the differences between their values and (!) of the ratios between them. Within the context of clause length, LENGTH_DIFF values such as 4 and 8 not only suggest that the latter is four units greater than the former but also that their ratio \(\frac{8}{4} = 2\) is a valid way to describe the relationship between these values. Here a LENGTH_DIFF of 0 can be clearly viewed as the absence of a length difference.
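The contrast between interval and ratio scales can be checked with a few lines of arithmetic. The sketch below re-expresses the two temperatures from the example in Kelvin (Celsius plus 273.15), a scale that does have a true zero; the object names are chosen for illustration only.

```r
# Temperatures from the example, in degrees Celsius
celsius <- c(4, -2)

# The same temperatures in Kelvin, a scale with a true zero point
kelvin <- celsius + 273.15

# Differences are the same on both scales: 6 degrees
diff_celsius <- celsius[1] - celsius[2]
diff_kelvin <- kelvin[1] - kelvin[2]

# Ratios are not: they depend on where the arbitrary zero sits
ratio_celsius <- celsius[1] / celsius[2]  # -2
ratio_kelvin <- kelvin[1] / kelvin[2]     # roughly 1.02

# Ratio-scaled values (the LENGTH_DIFF values from the example)
length_diff <- c(4, 8)
ratio_length <- length_diff[2] / length_diff[1]  # 2
```

Because the difference survives the change of scale while the ratio does not, only the difference is a meaningful statement about Celsius temperatures.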
3.3 Introduction to ggplot2
3.3.1 Building a ggplot
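A ggplot requires at minimum three elements: (1) a data frame, (2) an aesthetic mapping that assigns variables to the axes, and (3) a plotting option (also known as a “geom”). The data frame and the mapping are passed to ggplot(); the geom is added with the + sign.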
library(ggplot2)

# Supply data frame
ggplot(data = cl.order,
       # Map variables to the x- and y-axes
       mapping = aes(x = LEN_MC, y = LEN_SC)) +
  # Set plotting option (here: scatterplot)
  geom_point()
3.3.2 Adding layers
Visualise a third variable using the color argument inside the aes() function; further variables can be mapped to other aesthetics such as shape.
ggplot(data = cl.order,
       # Map variables to the axes
       mapping = aes(x = LEN_MC, y = LEN_SC)) +
  # Map further variables to colours and shapes
  geom_point(aes(color = ORDER, shape = SUBORDTYPE)) +
  # Add a title, subtitle, axis labels and legend titles
  labs(
    title = "Length of main and subordinate clauses",
    subtitle = "Dimensions for different ordering types",
    x = "Length of main clause",
    y = "Length of subordinate clause",
    color = "ORDER",
    shape = "SUBORDTYPE"
  ) +
  # Apply the classic theme
  theme_classic()
Save the last plot displayed in the viewer to a file (here, in the figures folder of your working directory):
ggplot(cl.order, aes(x = LEN_MC, y = LEN_SC)) +
  geom_point()

ggsave("figures/clause_length_plot.png")
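Relying on the last plot displayed can be fragile once several plots have been drawn. As a hedged alternative, the sketch below stores the plot in an object and passes it to ggsave() explicitly. The toy data frame, its values, the object names and the chosen dimensions are assumptions made for illustration; they merely stand in for cl.order.

```r
library(ggplot2)

# Hypothetical toy data standing in for cl.order (values are invented)
toy <- data.frame(
  LEN_MC = c(5, 9, 12, 7),
  LEN_SC = c(8, 6, 10, 11)
)

# Store the plot in an object instead of relying on the last plot displayed
p <- ggplot(toy, aes(x = LEN_MC, y = LEN_SC)) +
  geom_point()

# Write to a temporary folder so that the sketch runs anywhere;
# in a project, a path such as "figures/clause_length_plot.png" would be used
out_file <- file.path(tempdir(), "clause_length_plot.png")

# width and height are given in inches by default; dpi sets the resolution
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Setting the width, height and resolution explicitly makes the saved figure independent of the current size of the viewer pane.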
Heumann, Christian, Michael Schomaker, and Shalabh. 2022. Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R. 2nd ed. Cham: Springer. https://doi.org/10.1007/978-3-031-11833-3.
Paquot, Magali, and Tove Larsson. 2020. “Descriptive Statistics and Visualization with R.” In A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Thomas Gries, 375–99. Cham: Springer.